Abstract: We describe and analyze sparse graphical code
constructions for the problems of source coding with decoder 
side information (the Wyner-Ziv problem), and channel coding 
with encoder side information (the Gelfand-Pinsker problem). 
Our approach relies on a combination of low-density parity check 
(LDPC) codes and low-density generator matrix (LDGM) codes, 
and produces sparse constructions that are simultaneously good 
as both source and channel codes. In particular, we prove that 
under maximum likelihood encoding/decoding, there exist low- 
density codes (i.e., with finite degrees) from our constructions that 
can saturate both the Wyner-Ziv and Gelfand-Pinsker bounds. 

I. Introduction 

Sparse graphical codes, particularly low-density parity 
check (LDPC) codes, are widely used and well understood 
in application to channel coding problems [10]. For other 
communication problems, especially those involving aspects 
of both channel and source coding, there remain various 
open questions associated with using low-density code 
constructions. Two important examples are source coding 
with side information (the Wyner-Ziv problem), and channel 
coding with side information (the Gelfand-Pinsker problem). 
This paper focuses on the design and analysis of low- 
density codes — more specifically, constructions based on 
a combination of LDPC and low-density generator matrix 
(LDGM) codes — for source and channel coding with side 
information. It builds on our previous work [7], in which we 
proved that low-density constructions and ML decoding can 
saturate the rate-distortion bound for a symmetric Bernoulli 
source. 

Related work: It is well-known that random constructions of 
nested codes can saturate the Wyner-Ziv and Gelfand-Pinsker 
bounds [13], [15]. However, an unconstrained random 
construction leads to a high-density code, which is of 
limited practical use. One practically viable approach to 
lossy compression is trellis coded quantization (TCQ) [6]. 
A number of researchers have exploited TCQ as a quantizer 
for the Wyner-Ziv and related multiterminal source coding 
problems [2], [14] as well as for channel coding with 
side information []. A disadvantage of TCQ is that 
saturating rate-distortion bounds requires that the trellis
constraint length be taken to infinity [11]; consequently, the
computational complexity of decoding, even using message- 
passing algorithms, grows exponentially. It is therefore 
of considerable interest to develop low-density graphical 
constructions for such problems. Past work by a number of 
researchers [8], [12], [3], [9] has suggested that LDGM codes, 
which arise as the duals of LDPC codes, are well-suited to 
various types of quantization. 

Our contributions: In this paper, we describe a sparse graphi- 
cal construction for generating nested codes that are simultane- 
ously good as both source and channel codes. We build on our 
previous work [7], in which we analyzed constructions, based 
on a combination of LDPC and LDGM codes, for the problem 
of standard lossy compression. Here we prove that there exist
variants of these joint LDPC/LDGM constructions with finite
degrees that, when encoded/decoded using maximum
likelihood, saturate the Wyner-Ziv and Gelfand-Pinsker
bounds. Although ML decoding is not practically viable, the
low-density nature of our constructions means that they have
low degree and, with high probability (w.h.p.), high girth and
expansion, all of which are important for the application of
efficient message-passing.

The remainder of this paper is organized as follows.
Section II provides background on source coding with side
information (SCSI, or the Wyner-Ziv problem), and channel
coding with side information (CCSI, or the Gelfand-Pinsker
problem). Section III introduces our joint LDGM/LDPC
construction, and provides a high-level overview of its use
for the SCSI and CCSI problems. In Section IV we prove
that our construction produces codes that are simultaneously
"good" for both source and channel coding. We conclude
with a discussion in Section V.

Notation: Vectors/sequences are denoted in bold (e.g., s),
random variables in sans serif font (e.g., s), and random
vectors/sequences in bold sans serif (e.g., s). Similarly, matrices
are denoted using bold capital letters (e.g., G) and random
matrices with bold sans serif capitals (e.g., G). We use
$I(\cdot\,;\cdot)$, $H(\cdot)$, and $D(\cdot\,\|\,\cdot)$ to denote mutual
information, entropy, and relative entropy (Kullback-Leibler
distance), respectively. Finally, we use $\mathrm{card}\{\cdot\}$ to denote
the cardinality of a set, $\|\cdot\|_p$ to denote the $p$-norm of a
vector, $\mathrm{Ber}(t)$ to denote a Bernoulli-$t$ distribution, and
$H_b(t)$ to denote the entropy of a $\mathrm{Ber}(t)$ random variable.

II. Background 

A. Source and channel coding 

We begin with definitions of "good" source and channel 
codes that are useful for future reference. 

Definition 1. (a) A code family is a good $D$-distortion binary
symmetric source code if for any $\epsilon > 0$, there exists a code
with rate $R < 1 - H_b(D) + \epsilon$ that achieves distortion $D$.
(b) A code family is a good BSC($p$)-noise channel code if for
any $\epsilon > 0$ there exists a code with rate $R > 1 - H_b(p) - \epsilon$
with error probability less than $\epsilon$.

B. Wyner-Ziv problem 

Suppose that we wish to compress a symmetric Bernoulli
source $\mathbf{s} \sim \mathrm{Ber}(\tfrac{1}{2})$ so as to be able to reconstruct it with
Hamming distortion $D$. By classical rate distortion theory [4],
the minimum achievable rate is given by $R(D) = 1 - H_b(D)$.
In the Wyner-Ziv extension [13], there is an additional source
of side information about $\mathbf{s}$, say in the form
$\mathbf{y} = \mathbf{s} \oplus \mathbf{w}$ where $\mathbf{w} \sim \mathrm{Ber}(p)$ is observation noise,
that is available only at the decoder. In this setting, the
minimum achievable rate takes the form
$R_{WZ}(D, p) = \mathrm{l.c.e.}\{H_b(D * p) - H_b(D),\ (p, 0)\}$,
where l.c.e. denotes the lower convex envelope and
$D * p = D(1 - p) + (1 - D)p$ denotes binary convolution. Note that
in the special case $p = \tfrac{1}{2}$, the side information is useless, so
that the Wyner-Ziv rate reduces to classical rate-distortion.
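
As a quick numerical check of the rate expressions above, the following sketch evaluates $H_b$, the binary convolution $D * p$, and the curve $H_b(D * p) - H_b(D)$ before the lower convex envelope is taken. The function names are illustrative choices and not part of the paper; the envelope with the point $(p, 0)$ is deliberately not computed.

```python
import math

def binary_entropy(t):
    """H_b(t) in bits, with the convention 0 * log 0 = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1.0 - t) * math.log2(1.0 - t)

def binary_convolution(a, b):
    """a * b = a(1 - b) + (1 - a)b: parameter of the XOR of Ber(a) and Ber(b)."""
    return a * (1.0 - b) + (1.0 - a) * b

def rate_distortion(D):
    """Classical rate-distortion function 1 - H_b(D) of a Ber(1/2) source."""
    return 1.0 - binary_entropy(D)

def wyner_ziv_curve(D, p):
    """H_b(D * p) - H_b(D), i.e. the Wyner-Ziv rate before the l.c.e. is taken."""
    return binary_entropy(binary_convolution(D, p)) - binary_entropy(D)
```

With $p = \tfrac{1}{2}$ the convolution $D * p$ equals $\tfrac{1}{2}$ for every $D$, so `wyner_ziv_curve(D, 0.5)` coincides with `rate_distortion(D)`, matching the remark that useless side information recovers classical rate-distortion.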

C. Gelfand-Pinsker problem

Now consider the binary information embedding problem:
the channel has the form $\mathbf{y} = \mathbf{u} \oplus \mathbf{s} \oplus \mathbf{z}$, where $\mathbf{u}$ is the
channel input, $\mathbf{s}$ is a host signal (not under control of the
encoder), and $\mathbf{z} \sim \mathrm{Ber}(p)$ is channel noise. The encoder is
free to choose the input vector $\mathbf{u} \in \{0,1\}^n$, subject to the
channel constraint $\|\mathbf{u}\|_1 \le wn$, so as to maximize the rate
of information transfer. We write $\mathbf{u} = \mathbf{u}_{\mathbf{m}}$ where $\mathbf{m}$ is the
underlying message to be transmitted. The decoder wishes to
recover the embedded message from the corrupted observation
$\mathbf{y}$. It can be shown [1] that the capacity in this set-up is given
by $R_{IE}(w, p) = \mathrm{u.c.e.}\{H_b(w) - H_b(p),\ (0, 0)\}$, where u.c.e.
denotes the upper convex envelope.

III. Generalized Compound Construction 

In this section, we describe a compound construction that 
produces codes that are simultaneously "good", in the senses 
previously defined, as source and channel codes. We then 
describe how the nested codes generated by this compound 
construction apply to the SCSI and CCSI problems. 

A. Code construction 

Consider the compound code construction illustrated
in Fig. 1, defined by a factor graph with three layers. The top
layer consists of $n$ bits, each attached to an associated parity
check. These parity checks connect to $m$ variable nodes in
the middle layer, and in turn these middle variable nodes are
connected to $k = k_1 + k_2$ parity checks in the bottom layer.

Fig. 1. Illustration of compound LDGM and LDPC code
construction. The top section consists of an $(n, m)$ LDGM
code with generator matrix $G$ and constant check degrees
$\gamma_t = 4$; its rate is $R(G) = \frac{m}{n}$. The bottom section
consists of $(m, k_1)$ and $(m, k_2)$ LDPC codes with degrees
$(\gamma_v, \gamma_c) = (3, 6)$, described by parity check matrices $H_1$
and $H_2$ and rates $R(H_1) = 1 - \frac{k_1}{m}$ and $R(H_2) = 1 - \frac{k_2}{m}$
respectively. The overall rate of the compound construction
is $R_{com} = R(G)R(H)$, where $R(H) = R(H_1) + R(H_2)$.

Random LDGM ensemble: The top two layers define an $(n, m)$
LDGM code. We construct it by connecting each of the $n$
checks at the top to $\gamma_t$ variable nodes in the middle
layer chosen uniformly at random. We use $G \in \{0,1\}^{m \times n}$
to denote the resulting generator matrix; by construction,
each column of $G$ has exactly $\gamma_t$ ones, whereas each row
(corresponding to a variable node) has an (approximately)
Poisson number of ones. An advantage of this regular-Poisson
degree ensemble is that the resulting distribution of a random
codeword is extremely easy to characterize:

Lemma 1. Let $G \in \{0,1\}^{m \times n}$ be a random generator
matrix obtained by randomly placing $\gamma_t$ ones per column.
Then for any vector $\mathbf{w} \in \{0,1\}^m$ with a fraction of $v$
ones, the distribution of the corresponding codeword $\mathbf{w}G$ is
Bernoulli($\delta(v; \gamma_t)$) where

$$\delta(v; \gamma_t) = \frac{1}{2}\left[1 - (1 - 2v)^{\gamma_t}\right]. \tag{1}$$
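
The formula in Lemma 1 can be checked empirically with a small Monte Carlo sketch. The sampling model here is an assumption made for illustration: each column of $G$ picks its $\gamma_t$ middle-layer positions uniformly with replacement, so a codeword bit is the parity of the number of picks that land in the support of $\mathbf{w}$. The function names are hypothetical.

```python
import random

def delta(v, gamma_t):
    """Equation (1): probability that a bit of wG equals one."""
    return 0.5 * (1.0 - (1.0 - 2.0 * v) ** gamma_t)

def empirical_one_fraction(m, weight, gamma_t, trials, seed=0):
    """Estimate P[(wG)_j = 1] for a fixed w of the given Hamming weight.

    Without loss of generality the support of w is {0, ..., weight - 1}.
    Each trial draws one random column of G (gamma_t positions chosen
    uniformly with replacement, an assumed sampling model) and records
    the parity of the number of positions falling in the support.
    """
    rng = random.Random(seed)
    ones = 0
    for _ in range(trials):
        hits = sum(1 for _ in range(gamma_t) if rng.randrange(m) < weight)
        ones += hits % 2
    return ones / trials
```

For example, with `m = 1000`, `weight = 100` (so $v = 0.1$) and `gamma_t = 4`, the empirical fraction should be close to `delta(0.1, 4)`, which is about 0.295.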

Random LDPC code: The bottom two layers define a pair of
LDPC codes, with parameters $(m, k_1)$ and $(m, k_2)$; we choose
these codes from a standard $(\gamma_v, \gamma_c)$-regular LDPC
ensemble originally studied by Gallager. Specifically, each of
the $m$ variable nodes in the middle layer connects to $\gamma_v$ check
nodes in the bottom layer. Similarly, each of the $k$ check nodes
in the bottom layer connects to $\gamma_c$ variable nodes in the middle
layer. For convenience, we restrict ourselves to even check
degrees $\gamma_c$. Dividing the $k$ check bits into two subsets, of
size $k_1$ and $k_2$ with respective parity check matrices $H_1$ and
$H_2$, allows for the construction of nested codes, which will be
needed for both the Wyner-Ziv and Gelfand-Pinsker problems.

B. Good source and channel codes 

The key theoretical properties of this joint LDGM/LDPC 
construction are summarized in the following results: 

Theorem 1 (Good source code). With appropriate finite
degrees, there exist $(n, m, k)$ constructions that are $D$-good
source codes for all rates above $R(D) = 1 - H_b(D)$.

Theorem 2 (Good channel code). With appropriate finite
degrees, there exist $(n, m, k)$ constructions that are good
$p$-channel codes for all rates below capacity $C = 1 - H_b(p)$.

Theorem 1 on source coding was proved in our previous
work [7], whereas a proof of Theorem 2 is given in Section IV.
We now describe how these two theorems allow us to establish
that our low-density construction achieves the Wyner-Ziv and
Gelfand-Pinsker bounds. At a high level, our approach is
closely related to standard approaches to SCSI/CCSI coding;
the key novelty is that appropriately nested codes can be
constructed using low-density architectures.

C. Coding for Wyner-Ziv 

We focus only on achieving rates of the form $H_b(D * p) -
H_b(D)$, as any remaining rates on the Wyner-Ziv curve can
be achieved by time-sharing with the point $(p, 0)$. To do this,
we use the compound code in Fig. 1. Specifically, a source
$\mathbf{s}$ is encoded to $H_2\mathbf{w}$ where $\mathbf{w}$ is chosen to minimize the
distortion $\|\mathbf{s} - \mathbf{w}'G\|_1$ subject to the constraint that $H_1\mathbf{w} = \mathbf{0}$.
Theorems 1 and 2 show that maximum likelihood decoding
of $H_2\mathbf{w}$ using side information $\mathbf{y}$ approaches the Wyner-Ziv
bound in the sense that this construction yields a good
$D$-distortion binary source code, and a nested subcode that is a
good $(D * p)$-noise channel code. Details follow.
Source coding component: The $D$-distortion source code
component of the construction involves the $n$ variable nodes
representing the source bits, the $m$ intermediate variable nodes,
and the subset of $k_1$ lower layer check nodes. This subgraph,
represented by the generator matrix $G$ and parity check matrix
$H_1$ (see Fig. 1), defines a code (on the $n$ variable nodes) with
effective rate

$$R_1 = \frac{m}{n}\left(1 - \frac{k_1}{m}\right). \tag{2}$$

Choosing the middle and lower layer sizes $m$ and $k_1$ such that
$R_1 = 1 - H_b(D)$ guarantees (from Theorem 1) the existence
of finite degrees such that this code is a good $D$-distortion
source code.

Channel coding component: Now suppose that the source $\mathbf{s}$
has been quantized, and is represented (up to distortion $D$)
by the compressed sequence $\mathbf{x} \in \{0,1\}^m$. We transmit the
associated sequence $H_2\mathbf{x} \in \{0,1\}^{k_2}$ of parity bits associated
with the code $H_2$; doing so requires rate $R_{trans} = \frac{k_2}{n}$. The
task of the decoder is as follows: given these $k_2$ parity bits
as well as the $k_1$ zero-valued parity bits, the decoder seeks to
recover the quantized sequence $\mathbf{x}$ on the basis of the observed
side-information $\mathbf{y}$. Note that from the decoder's perspective,
the effective code rate is given by

$$R_2 = \frac{m - k_1 - k_2}{n}. \tag{3}$$

Suppose that we choose $k_2$ such that $R_2 = 1 - H_b(D * p)$;
then Theorem 2 guarantees that the decoder will (w.h.p.) be
able to recover a codeword corrupted by $(D * p)$-Bernoulli
noise. Note that the side information can be written as
$\mathbf{y} = \hat{\mathbf{s}} \oplus \mathbf{e} \oplus \mathbf{v}$, where $\hat{\mathbf{s}}$ denotes the quantized
reconstruction of $\mathbf{s}$, $\mathbf{e} := \mathbf{s} \oplus \hat{\mathbf{s}}$ is the quantization noise,
and $\mathbf{v} \sim \mathrm{Ber}(p)$ is the channel noise. If the quantization noise
$\mathbf{e}$ were i.i.d. $\mathrm{Ber}(D)$, then the overall effective noise $\mathbf{e} \oplus \mathbf{v}$
would be i.i.d. $\mathrm{Ber}(D * p)$. In reality, the quantization noise
is not exactly i.i.d. $\mathrm{Ber}(D)$, but it can be shown [15] that it
can be treated as such for theoretical purposes.

In summary then, the overall transmission rate of this
scheme for the Wyner-Ziv problem is given by

$$R_{trans} = \frac{m - k_1}{n} - \frac{m - k_1 - k_2}{n} = H_b(D * p) - H_b(D). \tag{4}$$

Thus, by applying Theorems 1 and 2, we conclude that our
low-density scheme saturates the Wyner-Ziv bound.
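
To make the rate bookkeeping in equations (2) to (4) concrete, here is a minimal sketch that picks integer layer sizes $k_1$ and $k_2$ for given $n$, $m$, $D$ and $p$. The helper name `wyner_ziv_layer_sizes` and the rounding to the nearest integer are assumptions for illustration; the paper only specifies the target rates.

```python
import math

def binary_entropy(t):
    """H_b(t) in bits, with the convention 0 * log 0 = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1.0 - t) * math.log2(1.0 - t)

def binary_convolution(a, b):
    """a * b = a(1 - b) + (1 - a)b."""
    return a * (1.0 - b) + (1.0 - a) * b

def wyner_ziv_layer_sizes(n, m, D, p):
    """Return (k1, k2) with (m - k1)/n close to 1 - H_b(D)       (eq. 2)
    and (m - k1 - k2)/n close to 1 - H_b(D * p)                  (eq. 3)."""
    source_dim = round(n * (1.0 - binary_entropy(D)))    # target m - k1
    channel_dim = round(
        n * (1.0 - binary_entropy(binary_convolution(D, p)))
    )                                                    # target m - k1 - k2
    k1 = m - source_dim
    k2 = source_dim - channel_dim
    if k1 < 0 or k2 < 0:
        raise ValueError("infeasible sizes for this choice of n, m, D, p")
    return k1, k2

# Usage: the transmitted rate k2 / n approximates H_b(D * p) - H_b(D) (eq. 4).
n, m, D, p = 10000, 8000, 0.1, 0.2
k1, k2 = wyner_ziv_layer_sizes(n, m, D, p)
target = binary_entropy(binary_convolution(D, p)) - binary_entropy(D)
print(k1, k2, k2 / n, target)
```

The middle-layer size $m$ is a free design parameter in this sketch; it only needs to be large enough that $k_1 \ge 0$.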

D. Coding for Gelfand-Pinsker 

The construction for the Gelfand-Pinsker problem is similar,
but with the order of the code nesting reversed. In particular,
the Gelfand-Pinsker problem requires a good $p$-noise channel
code, and a nested subcode that is a good $w$-distortion source
code. As before, we focus only on achieving rates of the
form $H_b(w) - H_b(p)$. To encode a message $\mathbf{m}$ with side
information $\mathbf{y}$, the channel input is $\mathbf{w}'G$ where $\mathbf{w}$ is chosen
to minimize $\|\mathbf{y} - \mathbf{w}'G\|_1$ subject to $H_1\mathbf{w} = \mathbf{0}$ and
$H_2\mathbf{w} = \mathbf{m}$. Details follow.
Source coding component: We begin by describing the
nested subcode for the source coding component. The idea
is to embed a message into the transmitted signal during the
quantization process. The first set of $k_1$ lower parity bits
remains fixed to zero throughout the scheme. On the other
hand, we use the remaining $k_2$ lower parity bits to specify a
particular message $\mathbf{m} \in \{0,1\}^{k_2}$ that the decoder would like
to recover. With the lower parity bits specified in this way, we
use the resulting code to quantize a given source sequence $\mathbf{s}$
to a compressed version $\hat{\mathbf{s}}$. If we choose $n$, $m$ and $k$ such that

$$R_1 = \frac{m - k_1 - k_2}{n} = 1 - H_b(w), \tag{5}$$

then Theorem 1 guarantees that the resulting code is a good
$w$-distortion source code. Otherwise stated, we are guaranteed
that w.h.p. the error $\mathbf{e} := \mathbf{s} \oplus \hat{\mathbf{s}}$ in our quantization has
Hamming weight upper bounded by $wn$. Thus, transmitting
the error $\mathbf{e}$ ensures that the channel constraint is met.
Channel coding component: At the decoder, the $k_1$ lower
parity bits remain set to zero; the remaining $k_2$ parity bits,
which represent the message $\mathbf{m}$, are unknown to the decoder.
We now choose $k_1$ such that the effective code used by the
decoder has rate

$$R_2 = \frac{m - k_1}{n} = 1 - H_b(p). \tag{6}$$

In addition, the decoder is given a noisy channel observation
of the form $\mathbf{y} = \mathbf{e} \oplus \mathbf{s} \oplus \mathbf{v} = \hat{\mathbf{s}} \oplus \mathbf{v}$ and its task is to recover
$\hat{\mathbf{s}}$. With the channel coding rate chosen as in equation (6) and
channel noise $\mathbf{v} \sim \mathrm{Ber}(p)$, Theorem 2 guarantees that the
decoder will w.h.p. be able to recover $\hat{\mathbf{s}}$. By design of the
quantization procedure, we have the equivalence $\mathbf{m} = \hat{\mathbf{s}} H_2$, so
that a simple syndrome-forming procedure allows the decoder
to recover the hidden message. Thus, by applying Theorems 1
and 2, we conclude that our low-density scheme saturates the
Gelfand-Pinsker bound under ML encoding/decoding.
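
The encode/decode flow just described can be traced on a toy example. Everything concrete in this sketch is hypothetical and chosen only for illustration: the small matrices `G`, `H1`, `H2` (with $m = 4$, $n = 6$, $k_1 = k_2 = 1$) are not low-density random draws, exhaustive search stands in for ML encoding/decoding, and the channel is taken to be noiseless ($\mathbf{v} = \mathbf{0}$). The generator is systematic so that distinct middle-layer words give distinct codewords.

```python
from itertools import product

# Hypothetical toy matrices over GF(2); not taken from the paper.
G = [  # m x n generator matrix (m = 4, n = 6)
    [1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
]
H1 = [[1, 1, 1, 1]]  # k1 = 1 check, always held at zero
H2 = [[1, 1, 0, 0]]  # k2 = 1 check, carries the message bit

def encode_word(z):
    """Top-layer codeword zG over GF(2)."""
    return [sum(z[i] * G[i][j] for i in range(len(G))) % 2
            for j in range(len(G[0]))]

def syndrome(H, z):
    """Parity checks Hz over GF(2)."""
    return [sum(h * zi for h, zi in zip(row, z)) % 2 for row in H]

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def gp_encode(host, message):
    """Quantize the host to the nearest codeword with H1 z = 0, H2 z = message,
    and return the channel input e = host XOR s_hat."""
    zero = [0] * len(H1)
    candidates = [z for z in product((0, 1), repeat=len(G))
                  if syndrome(H1, z) == zero and syndrome(H2, z) == list(message)]
    best = min(candidates, key=lambda z: sum(xor(host, encode_word(z))))
    return xor(host, encode_word(best))

def gp_decode(received):
    """Find the nearest codeword with H1 z = 0, then read off the message H2 z."""
    zero = [0] * len(H1)
    candidates = [z for z in product((0, 1), repeat=len(G))
                  if syndrome(H1, z) == zero]
    best = min(candidates, key=lambda z: sum(xor(received, encode_word(z))))
    return syndrome(H2, best)

# Usage: y = e XOR s (no channel noise) equals s_hat, so the message is recovered.
host = [1, 0, 1, 1, 0, 0]
for msg in ([0], [1]):
    e = gp_encode(host, msg)
    print(msg, e, gp_decode(xor(e, host)))
```

The Hamming weight of the channel input `e` is exactly the quantization distortion, which is why a good $w$-distortion source code keeps the input within the channel constraint.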



IV. Proof of Theorem 2

As described in the previous sections, Theorems 1 and 2
allow us to establish that the Wyner-Ziv and Gelfand-Pinsker
bounds can be saturated under ML encoding/decoding. The
source coding part, namely Theorem 1, was proved in our
earlier work [7]. Here we provide a proof of Theorem 2,
which ensures that these joint LDGM/LDPC constructions
are good channel codes. We consider a joint construction, as
illustrated in Fig. 1, consisting of a rate $R(G)$ LDGM top
code, and a rate $R(H)$ lower LDPC code. Recall that the
overall rate of this compound construction is given by $R_{com} =
R(G)R(H)$. Note that an LDGM code on its own (i.e., without
the lower LDPC code) is a special case of this construction
with $R(H) = 1$. However, a standard LDGM of this variety
is not a good channel code, due to the large number of
low-weight codewords. Essentially, the following proof shows that
using a non-trivial LDPC lower code (with $R(H) < 1$) can
eliminate these troublesome low-weight codewords.

If the codeword $\mathbf{c}$ is transmitted, then the receiver observes
$\mathbf{y} = \mathbf{c} \oplus \mathbf{v}$ where $\mathbf{v}$ is a $\mathrm{Ber}(p)$ random vector. Our goal is to
bound the probability that maximum likelihood (ML) decoding
fails, where the probability is taken over the randomness in
both the channel noise and the code construction. To simplify
the analysis, we focus on the following sub-optimal (non-ML)
decoding procedure:

Definition 2 (Decoding rule). With threshold
$d(n) := (p + n^{-1/3})n$, decode to codeword
$\mathbf{c}_i \iff \|\mathbf{c}_i \oplus \mathbf{y}\|_1 < d(n)$,
and no other codeword is within $d(n)$ of $\mathbf{y}$.

(The extra term $n^{-1/3}$ in the threshold $d(n)$ is of theoretical
convenience.) Due to the linearity of the code construction,
we may assume without loss of generality that the all zeros
codeword $\mathbf{0}^n$ was transmitted (i.e., $\mathbf{c} = \mathbf{0}^n$). In this case,
the channel output is simply $\mathbf{y} = \mathbf{v}$ and so our decoding
procedure will fail if and only if either (i) $\|\mathbf{v}\|_1 > d(n)$,
or (ii) there exists some nonzero "middle layer codeword"
$\mathbf{z} \in \{0,1\}^m$ satisfying the parity check equation
$H\mathbf{z} = \mathbf{0}$ (see footnote 1) and corresponding to a codeword
$\mathbf{z}G$ such that $\|\mathbf{z}G \oplus \mathbf{v}\|_1 < d(n)$. Using the following
two lemmas, we establish that this
procedure has arbitrarily small probability of error, whence
ML decoding (which is at least as good) also has arbitrarily
small error probability.
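
The threshold rule of Definition 2 is simple enough to state in a few lines of code. The sketch below assumes the codebook is given explicitly as a list of distinct bit tuples, which is only feasible for tiny codes; the names `threshold_decode` and `hamming_distance` are illustrative and not from the paper.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def threshold_decode(y, codebook, p):
    """Definition 2: return the unique codeword within d(n) = (p + n^(-1/3)) n
    of y, or None (decoding failure) if there is no such codeword or more
    than one."""
    n = len(y)
    d = (p + n ** (-1.0 / 3.0)) * n
    within = [c for c in codebook if hamming_distance(c, y) < d]
    return within[0] if len(within) == 1 else None

# Usage with a length-12 repetition code and p = 0.1 (threshold about 6.44).
codebook = [(0,) * 12, (1,) * 12]
print(threshold_decode((1, 1) + (0,) * 10, codebook, 0.1))   # unique codeword in range
print(threshold_decode((1,) * 6 + (0,) * 6, codebook, 0.1))  # both in range: failure
```

The two failure modes in the text correspond to an empty `within` list (the noise weight reaches the threshold) and a list with more than one entry (a competing codeword falls inside the threshold).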

Lemma 2. The probability of decoding error vanishes
asymptotically provided that

$$R(G)A(v) - D\left(p \,\|\, \delta(v; \gamma_t) * p\right) < 0 \quad \text{for all } v \in (0, \tfrac{1}{2}], \tag{7}$$

where $A(v) := \lim_{m \to +\infty} A_m(v)$ is the asymptotic log-domain
weight enumerator of the LDPC code, with $A_m(v)$
being the average log-domain weight enumerator defined as

$$A_m(v) := \frac{1}{m} \log \mathbb{E}\,\mathrm{card}\{\mathbf{z} \mid \|\mathbf{z}\|_1 = vm\}. \tag{8}$$



Footnote 1: To be more precise, for the channel decoding step of the Wyner-Ziv
problem, the middle layer codeword must satisfy $H_1\mathbf{z} = \mathbf{0}$ and $H_2\mathbf{z} = \mathbf{m}$
where $\mathbf{m}$ is the output of the Wyner-Ziv encoder. For the channel decoding
step of the Gelfand-Pinsker problem, the middle layer codeword must only
satisfy $H_1\mathbf{z} = \mathbf{0}$, since $\mathbf{m}$ is unknown until decoding is complete.



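Proof. Let $N = 2^{nR_{com}}$ denote the total number of codewords
in the joint LDGM/LDPC code. Then we can upper bound the
probability of error using the union bound as follows:

$$P_{err} \le P\left[\|\mathbf{v}\|_1 > d(n)\right] + \sum_{i=2}^{N} P\left[\|\mathbf{z}_i G \oplus \mathbf{v}\|_1 < d(n)\right]. \tag{9}$$

By Bernstein's inequality, the probability of the first error
event vanishes for large $n$. Now focusing on the second sum,
let us condition on the event that $\|\mathbf{z}\|_1 = \ell$. Then Lemma 1
guarantees that $\mathbf{z}G$ has i.i.d. $\mathrm{Ber}(\delta(\frac{\ell}{m}; \gamma_t))$ elements, so that
the vector $\mathbf{z}G \oplus \mathbf{v}$ has i.i.d. $\mathrm{Ber}(\delta(\frac{\ell}{m}; \gamma_t) * p)$ elements.
Applying Sanov's theorem yields the upper bound

$$P\left[\|\mathbf{z}G \oplus \mathbf{v}\|_1 < d(n) \,\middle|\, \|\mathbf{z}\|_1 = \ell\right] \le 2^{-nD\left(p \,\|\, \delta(\frac{\ell}{m}; \gamma_t) * p\right)}.$$

We can then upper bound the second error term in (9) via

$$\begin{aligned}
\sum_{\ell=0}^{m} 2^{\,nR_{com} + m\left[A_m(\frac{\ell}{m}) - R(H)\right] - nD\left(p \| \delta(\frac{\ell}{m}; \gamma_t) * p\right)}
&= \sum_{\ell=0}^{m} 2^{\,n\left\{R(G)A_m(\frac{\ell}{m}) - D\left(p \| \delta(\frac{\ell}{m}; \gamma_t) * p\right)\right\}} \\
&= \sum_{\ell=0}^{m} 2^{\,n\left\{R(G)\left[A_m(\frac{\ell}{m}) - A(\frac{\ell}{m}) + A(\frac{\ell}{m})\right] - D\left(p \| \delta(\frac{\ell}{m}; \gamma_t) * p\right)\right\}} \\
&\le \sum_{\ell=0}^{m} 2^{\,n\left\{R(G)\left|A_m(\frac{\ell}{m}) - A(\frac{\ell}{m})\right|_+ + R(G)A(\frac{\ell}{m}) - D\left(p \| \delta(\frac{\ell}{m}; \gamma_t) * p\right)\right\}},
\end{aligned}$$

where the first equality uses $R_{com} = R(G)R(H)$ together with
$m = nR(G)$, and $|x|_+$ denotes $\max(0, x)$. Finally,
we notice that by the definition of the asymptotic weight
enumerator $A(v)$, the $|A_m(v) - A(v)|_+$ term converges to
zero uniformly (see footnote 2) for $v \in [0, 1]$, leaving only the
error exponent (7), which is negative by assumption. □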

Lemma 3. For any $p \in (0, 1)$ and total rate $R_{com} :=
R(G)R(H) < 1 - H_b(p)$, it is possible to choose the code
parameters $\gamma_t$, $\gamma_c$ and $\gamma_v$ such that (7) is satisfied.

Proof. For brevity, let $F(v) = R(G)A(v) - D\left(p \,\|\, \delta(v; \gamma_t) * p\right)$.
It is well-known that a regular LDPC code with rate
$R(H) = 1 - \frac{\gamma_v}{\gamma_c} < 1$ has linear minimum distance; in particular,
there exists a threshold $v^* = v^*(\gamma_v, \gamma_c)$ such that $A(v) \le 0$
for all $v \in [0, v^*]$. Hence, for $v \in (0, v^*)$, we have $F(v) < 0$.
Turning now to the interval $[v^*, \tfrac{1}{2}]$, consider the function

$$G(v) := R_{com}H_b(v) - D\left(p \,\|\, \delta(v; \gamma_t) * p\right).$$

Since $A(v) \le R(H)H_b(v)$, we have $F(v) \le G(v)$, so that it
suffices to upper bound $G$. Observe that
$G(\tfrac{1}{2}) = R_{com} - (1 - H_b(p)) < 0$. Therefore, it suffices to show
that, by appropriate choice of $\gamma_t$, we can ensure that
$G(v) \le G(\tfrac{1}{2})$. Noting that
$G$ is infinitely differentiable and taking derivatives (details
omitted), it can be shown that $G'(\tfrac{1}{2}) = 0$ and $G''(\tfrac{1}{2}) < 0$.
Hence, a second order Taylor series expansion yields that
$G(v) \le G(\tfrac{1}{2})$ for all $v \in (\mu, \tfrac{1}{2}]$ for some $\mu < \tfrac{1}{2}$. It remains
to bound $G$ on the interval $[v^*, \mu]$. On this interval, we have
$G(v) \le R_{com}H_b(\mu) - D\left(p \,\|\, \delta(v^*; \gamma_t) * p\right)$. By examining (1), we
see that choosing $\gamma_t$ sufficiently large will ensure that on the
interval $[v^*, \mu]$, the RHS is less than $R_{com} - (1 - H_b(p))$ as
required. □

Footnote 2: The definition of $A(v)$ implies pointwise convergence of
$|A_m(v) - A(v)|_+$ for $v \in [0, 1]$. But since the domain is compact, pointwise
convergence implies uniform convergence.

Fig. 2. Plots of different terms in error exponent (7). The combined curve
must remain negative for all $\omega$ in order for the error probability to
vanish asymptotically. (a) An LDGM $\gamma_t = 4$ construction without any LDPC
lower code: here the weight enumerator $A$ is given by $H_b(\omega)$, and it
dominates the Kullback-Leibler term for low $\omega$. (b) The same $\gamma_t = 4$
LDGM combined with a $(\gamma_v, \gamma_c) = (3, 6)$ LDPC lower code: here the LDPC
weight enumerator is dominated for all $\omega$ by the KL error exponent.

Theorem 2 follows by combining the previous lemmas.

At first glance, Lemma 3 may seem unsatisfying, since it
might require a very large top degree $\gamma_t$. Note, however, that
this degree does not depend on the block length, hence our
claim that good low density codes can be constructed with
finite degree. Of course, for the claim of finite degree codes
to be practically meaningful, the degree required for $\gamma_t$ should
be reasonably small. To investigate this issue, we plot the
error exponent for rate $R_{com} = 0.5$, LDGM top degree
$\gamma_t = 4$, and different choices of lower code with rate $R(H)$ in
Figure 2. Without any lower LDPC code, $R(H) = 1$ and
the effective asymptotic weight enumerator is simply $H_b(\omega)$.
Panel (a) shows the behavior in this case: note that the error
exponent exceeds zero in a region around $v = 0$ where
the weight enumerator dominates the negative KL term. In
contrast, panel (b) shows the case of a $(\gamma_v, \gamma_c) = (3, 6)$
LDPC code, where we have used the results of Litsyn and
Shevelev [5] in plotting the asymptotic weight enumerator.
This code family has a linear minimum distance, so that the
log-domain weight enumerator is negative in a region around
$v = 0$. As a result, the error exponent (7) remains negative for all
$v \in (0, 0.5]$. Thus, provided that a (3, 6) lower LDPC code is
used, a very reasonable top degree of $\gamma_t = 4$ is sufficient.
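
The behavior described for panel (a) can be reproduced numerically. The sketch below evaluates the exponent (7) for an LDGM code with no lower LDPC code, so that $R(G) = R_{com}$ and $A(v) = H_b(v)$. The channel parameter $p = 0.1$ used in the usage lines is an assumption (the text does not state the value behind the figure); it is chosen so that $R_{com} = 0.5$ lies below capacity $1 - H_b(p)$. The (3, 6) case of panel (b) is not reproduced, since it needs the Litsyn-Shevelev weight enumerator [5].

```python
import math

def binary_entropy(t):
    """H_b(t) in bits, with the convention 0 * log 0 = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1.0 - t) * math.log2(1.0 - t)

def binary_convolution(a, b):
    """a * b = a(1 - b) + (1 - a)b."""
    return a * (1.0 - b) + (1.0 - a) * b

def delta(v, gamma_t):
    """Equation (1)."""
    return 0.5 * (1.0 - (1.0 - 2.0 * v) ** gamma_t)

def kl_bernoulli(a, b):
    """D(Ber(a) || Ber(b)) in bits; assumes 0 < b < 1."""
    total = 0.0
    if a > 0.0:
        total += a * math.log2(a / b)
    if a < 1.0:
        total += (1.0 - a) * math.log2((1.0 - a) / (1.0 - b))
    return total

def ldgm_only_exponent(v, rate, p, gamma_t):
    """Exponent (7) with R(H) = 1: rate * H_b(v) - D(p || delta(v; gamma_t) * p)."""
    q = binary_convolution(delta(v, gamma_t), p)
    return rate * binary_entropy(v) - kl_bernoulli(p, q)

# Usage: scan the exponent for R_com = 0.5, gamma_t = 4, assumed p = 0.1.
for v in (0.01, 0.05, 0.1, 0.3, 0.5):
    print(v, ldgm_only_exponent(v, 0.5, 0.1, 4))
```

Under these assumed parameters the exponent is positive for small $v$ (for instance at $v = 0.01$) and negative at $v = 0.5$, which is the low-weight-codeword problem that the lower LDPC code is meant to remove.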

V. Discussion 

We have established that sparse graphical constructions that 
exploit both LDGM and LDPC codes can saturate fundamental 
bounds for problems of source coding with side informa- 
tion, and channel coding with side information. Although the 
present results are based on ML encoding/decoding, the spar- 
sity and graphical structure of our constructions render them 



suitable candidates for practical message-passing schemes,
which remain to be investigated in future work.